Abstract — Measuring network flow sizes is important for
tasks like accounting/billing, network forensics and security. 
Per-flow accounting is considered hard because it requires that 
many counters be updated at a very high speed; however, 
the large fast memories needed for storing the counters are 
prohibitively expensive. Therefore, current approaches aim to 
obtain approximate flow counts; that is, to detect large elephant 
flows and then measure their sizes. 

Recently, the authors and their collaborators developed a novel
method for per-flow traffic measurement [1] that is fast, highly
memory-efficient and accurate. At the core of this method is a
novel counter architecture called "counter braids." In this paper,
we analyze the performance of the counter braid architecture under
a Maximum Likelihood (ML) flow size estimation algorithm and show
that it is optimal; that is, the number of bits needed to store the
size of a flow matches the entropy lower bound. While the ML
algorithm is optimal, it is too complex to implement. In [1] we
developed an easy-to-implement and efficient message passing
algorithm for estimating flow sizes.

I. Introduction 

This paper addresses a theoretical problem arising in a 
novel approach to network traffic measurement the authors 
and their collaborators have recently developed. We refer 
the reader to [1] for technological background, motivation, 
related literature and other details. In order to keep this paper 
self-contained, we summarize the background and restrict the 
literature survey to what is relevant for the results of this 
paper. 

Background. Measuring the sizes of network flows on high 
speed links is known to be a technologically challenging 
problem [2]. The nature of the data to be measured is as 
follows: at any given time, tens or hundreds of thousands
of flows can be active on core Internet links. Packets arrive
at the rate of one every 40-50 nanoseconds on these
links, which currently run at 10 Gbps. Finally, flow size
distributions are heavy-tailed, giving rise to the well-known
decomposition of flows into a large number of short "mice"
and a few large "elephants." As a rule of thumb, network
traffic follows an "80-20 rule": 80% of the flows are small,
while the remaining 20%, the large flows, carry about 80%
of the packets or bytes.

This implies that measuring flow sizes accurately requires 
a large array of counters which can be updated at very 
high speeds, and a good counter management algorithm for 
updating counts, installing new counters when flows initiate
and uninstalling them when flows terminate. 

Since high-speed large memories are either too expensive 
or simply infeasible with the current technology, the bulk 
of research on traffic measurement has focused on approxi- 
mate counting methods. These approaches aim at detecting 
elephant flows and measuring their sizes. 

Counter braids. In [1] we develop a novel counter architecture,
called "counter braids", which is fast, memory-efficient, and
accurately measures all flow sizes, not just those of the
elephants. We briefly review this architecture using the
following simple example.

Suppose we are given 5 numbers and are told that four of 
them are no more than 2 bits long while the fifth can be 8 
bits long. We are not told which is which! 

Figures 1 and 2 present two approaches for storing the
values of the 5 numbers. The first corresponds to a
traditional array of counters, whereby the same number of
memory registers is allocated to each measured variable
(flow). The structure in Fig. 2 is more memory-efficient,
but retrieving the count values is less straightforward and
requires a flow size estimation algorithm.
Fig. 1. A simple counter structure: to each flow size we associate
its binary representation (filled circle = 1, empty circle = 0). 




Fig. 2. Counter braid.

Viewed from an information-theoretic perspective, the
design of an efficient counting scheme together with a good flow
size estimation algorithm is equivalent to the design of an
efficient source code [3]. However, the applications we consider
impose a stringent constraint on such a code: each time the size
of a flow changes (because a new packet arrives), a small
number of operations must suffice to update the stored
information. This is not the case with standard source codes,
where changing a single letter in the source stream may
completely alter the compressed version.

In this paper we prove that, under a probabilistic model
for the flow sizes (namely, that they form a vector of iid
random variables), counter braids achieve a compression
rate equal to the entropy of the flow size distribution in
the large system limit. Namely, for any rate larger than
the flow entropy, the flow sizes can be recovered from the
counter values with error probability vanishing in the large
system limit. Further, we prove that optimal compression can be
achieved by braids that are sparse. The result is non-obvious,
since counter braids form a rather restrictive family
of architectures.

Our treatment makes use of techniques from the theory of
low-density parity-check (LDPC) codes, and the whole construction
is inspired by that of LDPC codes [4], [5]. The construction of
LDPC codes has an analogue in the source coding problem
thanks to the standard equivalence between coding over discrete
memoryless symmetric channels and compressing iid discrete
random variables [6]. However, the key ideas in the
present paper have been developed to deal with the problem
that the flow sizes are a priori unbounded. In the channel
coding language, this would be equivalent to using a countably
infinite input alphabet.

Finally, we insist on using sparse braids for two reasons.
First, this allows the stored values to be updated with a small
(typically bounded) number of operations. Second, it is easy
to see that ML decoding of counter braids is NP-hard,
since it includes ML decoding of linear codes as a special case
[7]. However, thanks to the sparseness of the underlying
graph, one may use iterative message passing techniques [8].
Indeed, a simple message passing algorithm for estimating
flow sizes is described, and analyzed using real and synthetic
network traces, in [1].

II. Counter Braids: Basic definitions 

Definition 1. A counter braid is a pair $(G, q)$ where $q \ge 2$
is an integer (register capacity) and $G$ is a directed acyclic
graph on vertex sets $I$ (input nodes) and $R$ (registers),
with the input nodes having in-degree zero. We write $G = (I, R, E)$,
with $E$ the set of directed edges in $G$.
For any node $i \in I \cup R$, we denote by $\partial^+ i = \{j : (i,j) \in E\}$
the set of children (immediate descendants) of $i$, and by
$\partial^- i = \{j : (j,i) \in E\}$ the set of parents of $i$.
Finally, $\partial i = \partial^+ i \cup \partial^- i$.

In the following we shall often omit the explicit reference
to the register capacity and write $G$ for $(G,q)$. The input
size of the braid is $|I| = n$, and its storage size is $|R| \equiv m$.
An important parameter is its rate, which we measure in bits
per input node:
$$ r \equiv \frac{|R|}{|I|}\, \log_2 q . \qquad (1)$$
We will say that a sequence of counter braids $\{G_n =
(I_n, R_n, E_n)\}$ is sparse if the number of edges per input
node, $|E_n|/|I_n|$, is bounded.

Definition 2. A state (or configuration) of the counter braid
$(G, q)$ is an assignment $(x, y)$ of non-negative integers to
the nodes in $G$, with $x = \{x_i : i \in I\} \in \mathbb{N}^I$ and
$y = \{y_j : j \in R\} \in \mathbb{N}^R$. The state $(x, y)$ is valid if
$y_j \in \{0, \dots, q-1\}$ for every register $j \in R$.

Notice that a valid register configuration can be regarded
as an element of $(\mathbb{Z}_q)^R$ (where $\mathbb{Z}_q$ is the group of integers
modulo $q$). We denote by $0$ the zero vector (in $\mathbb{N}^I$ or $\mathbb{N}^R$, as appropriate).

We now describe the braid behavior when one
of the input nodes is incremented by one unit (i.e., when a
packet arrives at input node $i \in I$). Assume the braid $(G, q)$
to be in a valid state $(x, y)$. Given $i \in I$, we define the new
state $(x', y') = T_i(x, y)$ by letting $x'_i = x_i + 1$, $x'_j = x_j$
for any $j \neq i$, and $y'$ be defined by the following procedure.

REGISTERS UPDATE (INPUT: flow index $i$)
 1: Set $y_j(0) = y_j$ for $j \notin \partial^+ i$, and $y_j(0) = y_j + 1$ otherwise;
 2: Set $t = 0$;
 3: while $y(t)$ is not valid
 4:   Let $j \in R$ be such that $y_j(t) \ge q$;
 5:   Set $y_j(t+1) = y_j(t) - q$;
 6:   For any $l \in \partial^+ j$, set $y_l(t+1) = y_l(t) + 1$;
 7:   For any $l \in R \setminus (\{j\} \cup \partial^+ j)$, set $y_l(t+1) = y_l(t)$;
 8:   Increment $t := t + 1$;
 9: end
10: return $y(t)$.

Notice that this definition is ambiguous in that we did not
specify which register to pick among those with $y_j(t) \ge q$
at step 4 of the registers update routine. However, this is not
necessary, as stated in the following lemma (the proof is
omitted from this extended abstract).

Lemma 1. The update procedure above halts after a finite
number of steps. Further, its output $T_i(x, y)$ does not depend
on the order in which the registers are updated.

With an abuse of notation we shall write $x' = T_i(x)$,
$y' = T_i(y)$ when $(x', y') = T_i(x, y)$.
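The registers update routine can be simulated directly. The Python sketch below is an illustration under our own assumptions, not the implementation of [1]. The adjacency-list encoding, the names `register_update` and `store`, and the toy braid (three inputs, two first-layer registers feeding one second-layer register, $q = 2$) are all hypothetical. A register without descendants simply discards its overflow. A seeded `random.Random` picks the overflowing register at step 4, so that different update orders can be compared as in Lemma 1.

```python
import random
from typing import List, Optional


def register_update(y: List[int], input_children: List[int],
                    reg_children: List[List[int]], q: int,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Return T_i(y) for an input i whose children are input_children.

    reg_children[j] lists the children of register j; a repeated entry
    stands for a multiple edge. If rng is given, the register handled at
    step 4 is drawn at random among the overflowing ones.
    """
    y = list(y)
    for j in input_children:                  # step 1: +1 on every child of i
        y[j] += 1
    while True:                               # steps 3-9
        overflow = [j for j, v in enumerate(y) if v >= q]
        if not overflow:                      # y(t) is valid
            return y                          # step 10
        j = rng.choice(overflow) if rng is not None else overflow[0]  # step 4
        y[j] -= q                             # step 5
        for l in reg_children[j]:             # step 6: carry into children of j
            y[l] += 1


# Hypothetical toy braid: registers 0, 1 form layer 1, register 2 layer 2.
INPUT_CHILDREN = [[0], [0, 1], [1]]
REG_CHILDREN = [[2], [2], []]
Q = 2


def store(increments: List[int],
          rng: Optional[random.Random] = None) -> List[int]:
    """Apply T_{i(1)}, ..., T_{i(N)} to the all-zero register state."""
    y = [0, 0, 0]
    for i in increments:
        y = register_update(y, INPUT_CHILDREN[i], REG_CHILDREN, Q, rng)
    return y


if __name__ == "__main__":
    seq = [0, 0, 0, 1, 2, 2]                  # input x = (3, 1, 2)
    outputs = set()
    for seed in range(20):
        rng = random.Random(seed)
        order = seq[:]
        rng.shuffle(order)                    # a different increment order
        outputs.add(tuple(store(order, rng)))
    print(outputs)                            # {(0, 1, 1)}
```

In this example, every shuffled increment order and every choice made at step 4 produce the same register configuration, $(0, 1, 1)$. This agrees with Lemma 1 and with the order-independence of the storage function.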

When input values x are incremented sequentially, the 
stored information y is updated according to the above pro- 
cedure. From now on we shall take a static view and assume 
a certain input x. The corresponding stored information y is 
obtained through the mapping defined below. 

Definition 3. Given a counter braid $(G, q)$, the associated
storage function $F_G : \mathbb{N}^I \to (\mathbb{Z}_q)^R$ returns, for any input
configuration $x \in \mathbb{N}^I$, a register configuration $y = F_G(x) \in (\mathbb{Z}_q)^R$
defined as follows. Let $x^{(0)} = 0, x^{(1)}, \dots, x^{(N)} = x$
be a sequence of input configurations such that $x^{(s)}$ is
obtained from $x^{(s-1)}$ by incrementing its entry $i(s)$. Then
$$ F_G(x) \equiv T_{i(N)} \circ T_{i(N-1)} \circ \cdots \circ T_{i(1)}(0) . \qquad (2)$$



We shall drop the subscript $G$ from $F_G$ whenever it is clear
from the context. A priori it is not obvious that the mapping
$F_G$ is well defined. In particular, it is not obvious that it
does not depend on the order in which input values are
incremented, i.e., on the sequence $\{i(1), \dots, i(N)\}$. This is
nevertheless the case (the proof is omitted).

Definition 4. Given a counter braid $(G, q)$, a reconstruction
(or decoding) function is a function $\widehat{F} : (\mathbb{Z}_q)^R \to \mathbb{N}^I$.

A. Main results 

Throughout this paper, we shall model the input values
as iid integer random variables $(X_1, \dots, X_n) = X$ (identifying
$I = [n]$) with common distribution $p$. The (binary)
entropy of this distribution will be denoted by
$H_2(p) = -\sum_{x \ge 0} p(x) \log_2 p(x)$.

Definition 5. A sequence of counter braids $\{G_n =
(I_n, R_n, E_n)\}$, with $|I_n| = n$, has design rate $r$ if
$$ r = \lim_{n \to \infty} \frac{|R_n|}{n}\, \log_2 q . \qquad (3)$$
It is reliable for the distribution $p$ if there exists a sequence
of reconstruction functions $\widehat{F}_n \equiv \widehat{F}_{G_n}$ such that, for $X$ a
random input and $Y = F_{G_n}(X)$,
$$ P_{\rm err}(G_n, \widehat{F}_n) \equiv \mathbb{P}\{\widehat{F}_n(Y) \neq X\} \to 0 . \qquad (4)$$



Shannon's source coding theorem implies that there cannot
exist reliable counter braids with asymptotic rate $r < H_2(p)$.
However, the achievability of such rates is far from obvious,
since counter braids are a fairly specific compression scheme.
The main theorem of this paper establishes achievability,
even under the restriction that the braid be sparse.

In order to avoid technical complications, we make two
assumptions on the input distribution $p$:

1) It has at most power-law tails. By this we mean that
$\mathbb{P}\{X_i \ge x\} \le A x^{-\epsilon}$ for some constants $A$ and $\epsilon > 0$.

2) It has decreasing digit entropy. Let $X_i =
\sum_{l \ge 0} X_i(l)\, q^l$ be the $q$-ary expansion of $X_i$, and $h_l$ be
the $q$-ary entropy of $X_i(l)$. Then $h_l$ is monotonically
decreasing in $l$ for any $q$ large enough.

We call a distribution $p$ with these two properties admissible.
While this class does not cover all possible distributions, it
is likely to include any case of practical interest.
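To make the admissibility conditions concrete, the following sketch computes the digit distributions $p_l$ and the digit entropies $h_l$ of a truncated power law. The distribution $p(x) \propto (x+1)^{-2.5}$ on $\{0, \dots, q^4 - 1\}$ and the value $q = 5$ are illustrative assumptions. Because truncation makes the tail condition trivial, the example only probes the decreasing-digit-entropy condition numerically. It also checks $h_0 \log_2 q \le H_2(p) \le \sum_l h_l \log_2 q$, which holds because the digits are functions of $X_i$ and $X_i$ is a function of its digits.

```python
import math
from typing import List


def truncated_power_law(alpha: float, support: int) -> List[float]:
    """p(x) proportional to (x + 1)^(-alpha) on {0, ..., support - 1}."""
    weights = [(x + 1) ** (-alpha) for x in range(support)]
    total = sum(weights)
    return [w / total for w in weights]


def digit_distributions(p: List[float], q: int,
                        num_digits: int) -> List[List[float]]:
    """p_l(d) = P{l-th least significant q-ary digit of X equals d}."""
    dists = [[0.0] * q for _ in range(num_digits)]
    for x, px in enumerate(p):
        for l in range(num_digits):
            dists[l][(x // q ** l) % q] += px
    return dists


def entropy(dist: List[float], base: float) -> float:
    return -sum(v * math.log(v, base) for v in dist if v > 0)


q = 5                                              # a prime register capacity
p = truncated_power_law(alpha=2.5, support=q ** 4)  # X < q^4: digits l = 0..3
h = [entropy(d, q) for d in digit_distributions(p, q, 4)]
H2 = entropy(p, 2)

print("h_l:", [round(v, 4) for v in h])            # decreasing in l here
print("H_2(p) =", H2, "<= sum_l h_l log2 q =", sum(h) * math.log2(q))
```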

Theorem 1. For any admissible input distribution $p$ and
any rate $r > H_2(p)$, there exists a sequence of reliable sparse
counter braids with asymptotic rate $r$.

As stressed above, we insist on the braid being sparse for
two reasons: (i) it allows the register contents $y$ to be updated
with a small number of operations whenever one entry of
$x$ is incremented (i.e., the storage function can be efficiently
recomputed); (ii) it opens the way to using low-complexity
message passing algorithms for estimating the input vector $x$
given the stored information (i.e., for evaluating the recovery
function $\widehat{F}$).



III. The architecture 

A. Layering 

We will consider layered architectures. By this we mean
that the set of registers is the disjoint union of $L$ layers $R =
R_1 \cup R_2 \cup \cdots \cup R_L$ and that directed edges are either from $I$
to $R_1$ or from $R_l$ to $R_{l+1}$ for some $l \in \{1, \dots, L-1\}$ (we
shall sometimes adopt the convention $R_0 = I$). We denote
by $y^{(l)} = \{y_i : i \in R_l\}$ the vector of register values in layer
$l$. We further let $m_l = |R_l|$ denote the size of the $l$-th layer
(with $m_0 = n$).

The graph structure is conveniently encoded in $L$ matrices
$H_1, \dots, H_L$, whereby $H_l$ is the $m_l \times m_{l-1}$ adjacency matrix
of the subgraph induced by $R_l \cup R_{l-1}$. We further let $\mathbb{H}^l =
H_l H_{l-1} \cdots H_1$. The storage function $F$ can be characterized
as follows.

Lemma 2. Consider an $L$-layer counter braid, let $x$ be
its input, and define the sequence of vectors $z^{(l)} \in \mathbb{N}^{R_l}$ by
$z^{(0)} = x$ and
$$ z^{(l)} = \big\lfloor H_l z^{(l-1)} / q \big\rfloor \qquad (5)$$
(the division and floor operations being component-wise on
the vector $H_l z^{(l-1)}$). Then the register values are
$y^{(l)} = H_l z^{(l-1)} \bmod q$.
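Lemma 2 replaces the sequential update with a closed-form, layer-by-layer computation. A minimal Python sketch follows, assuming the matrices $H_l$ are given as dense lists of integer rows. The names `mat_vec` and `layered_storage` and the two-layer example braid are hypothetical.

```python
from typing import List

Matrix = List[List[int]]  # H_l stored row by row (m_l rows, m_{l-1} columns)


def mat_vec(h: Matrix, v: List[int]) -> List[int]:
    return [sum(a * b for a, b in zip(row, v)) for row in h]


def layered_storage(x: List[int], hs: List[Matrix], q: int) -> List[List[int]]:
    """Register values y^(1), ..., y^(L) computed as in Lemma 2."""
    z = list(x)                         # z^(0) = x
    ys = []
    for h in hs:
        s = mat_vec(h, z)               # H_l z^(l-1)
        ys.append([v % q for v in s])   # y^(l) = H_l z^(l-1) mod q
        z = [v // q for v in s]         # z^(l) = floor(H_l z^(l-1) / q), Eq. (5)
    return ys


# Hypothetical two-layer braid: 3 inputs, 2 + 1 registers, q = 2.
H1 = [[1, 1, 0],
      [0, 1, 1]]
H2 = [[1, 1]]
print(layered_storage([3, 1, 2], [H1, H2], q=2))  # [[0, 1], [1]]
```

In this example $H_1 x = (4, 3)$, so $y^{(1)} = (0, 1)$ and $z^{(1)} = (2, 1)$. Then $H_2 z^{(1)} = 3$ gives $y^{(2)} = 1$, and the overflow of the top register is discarded.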



B. Recovery function 

We now describe the recovery function $\widehat{F}$. Since in this
paper we are only interested in achievability, we will neglect
complexity considerations.

1) One layer: Let us start from a one-layer braid and
assume the inputs to be iid with common distribution $p_*$
supported on $\{0, \dots, q-1\} \ni X_i$. Then the register values are
$y = Hx \bmod q$, where $H$ is the adjacency matrix of the
braid. Fix $\gamma \in (0, 1)$. We say that the input $x \in \{0, \dots,
q-1\}^n$ is typical, and write $x \in T_n(p_*)$, if its type $\theta_x$ satisfies
$D(\theta_x \| p_*) \le n^{-\gamma}$ (here the Kullback-Leibler divergence is
computed in natural base). Denote by $T_n(p_*; y)$ the set of
input vectors that are typical and such that $Hx = y \bmod q$.
The 'typical set decoder' returns a vector $x$ if this is the
unique element of $T_n(p_*; y)$, and a standard error message $*$
otherwise. In formulae,
$$ \widehat{F}(y) = \begin{cases} x & \text{if } T_n(p_*; y) = \{x\}, \\ * & \text{if } |T_n(p_*; y)| \neq 1. \end{cases} \qquad (6)$$
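A brute-force rendering of the typical set decoder (6) may help fix the definitions. It enumerates all $q^n$ candidate inputs, so it is usable only for toy sizes. The function names, the choice $\gamma = 1/2$ and the example distribution $p_* = (1/2, 1/3, 1/6)$ are illustrative assumptions, and `None` plays the role of the error symbol $*$.

```python
import itertools
import math
from collections import Counter
from typing import List, Optional, Sequence

Matrix = List[List[int]]


def kl_divergence(theta: Sequence[float], p: Sequence[float]) -> float:
    """D(theta || p) in natural base (infinite if theta is not supported by p)."""
    d = 0.0
    for t, pt in zip(theta, p):
        if t > 0:
            if pt == 0:
                return math.inf
            d += t * math.log(t / pt)
    return d


def is_typical(x: Sequence[int], p: Sequence[float], gamma: float) -> bool:
    """x in T_n(p): its type theta_x satisfies D(theta_x || p) <= n^(-gamma)."""
    n, counts = len(x), Counter(x)
    theta = [counts[a] / n for a in range(len(p))]
    return kl_divergence(theta, p) <= n ** (-gamma)


def typical_set_decoder(y: Sequence[int], h: Matrix, p: Sequence[float],
                        q: int, gamma: float = 0.5) -> Optional[List[int]]:
    """Eq. (6) by exhaustive search: the unique typical x with Hx = y mod q,
    or None (standing for the error symbol *) otherwise."""
    n = len(h[0])
    found = None
    for x in itertools.product(range(q), repeat=n):
        if not is_typical(x, p, gamma):
            continue
        if all(sum(a * b for a, b in zip(row, x)) % q == yj
               for row, yj in zip(h, y)):
            if found is not None:
                return None              # more than one typical solution
            found = list(x)
    return found                         # None if no typical solution exists


p_star = [1 / 2, 1 / 3, 1 / 6]
x = [0, 0, 0, 1, 1, 2]                   # its type equals p_star exactly
eye = [[int(i == j) for j in range(6)] for i in range(6)]
print(typical_set_decoder([v % 3 for v in x], eye, p_star, q=3))  # [0, 0, 0, 1, 1, 2]
```

With the identity matrix, the only solution is $x$ itself, and it is typical because its type equals $p_*$. With a single all-ones row, every permutation of $x$ is a typical solution, so the decoder reports an error.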



2) Multi-layer: Consider now a multi-layer braid and
$x \in \mathbb{N}^I$ (inputs not restricted to be smaller than $q$), with the $X_i$'s
distributed independently according to $p$. It is convenient to
write the input vector in base $q$:
$$ x = \sum_{a \ge 0} x(a)\, q^a , \qquad (7)$$
where $x(a) = \{x_i(a) : i \in I\}$ with $x_i(a) \in \{0, \dots, q-1\}$.
Notice that, for each $a \ge 0$, the vector $x(a)$ has iid entries.
Let $p_a$ be the distribution of $X_i(a)$ when $X_i$ has distribution
$p$.

We'll apply typical set decoding recursively, determining
the $q$-ary vectors $x(0), x(1), x(2), \dots$ in this order. Consider
first $x(0)$. It is clear from Lemma 2 that $y^{(1)} = H_1 x \bmod q =
H_1 x(0) \bmod q$. We then apply typical set decoding to the
determination of $x(0)$. More precisely, we look for a solution
of $H_1 x = y^{(1)} \bmod q$ that is typical under distribution $p_0$.
If there is a unique such solution, we declare it our estimate
of $x(0)$ and denote it by $\widehat{x}(0)$. Otherwise we declare an error.

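Consider now the determination of $x(l)$ and assume the
lower order terms in the expansion (7) have already been
estimated to be $\widehat{x}(0), \widehat{x}(1), \dots, \widehat{x}(l-1)$. Let
$\widehat{z}^{(0)} = \sum_{a=0}^{l-1} \widehat{x}(a)\, q^a$, and let $\widehat{z}^{(a)}$, $a \ge 1$, be
determined through the same recursion as in Eq. (5). Further let
$\widehat{y}^{(a)} = H_a \widehat{z}^{(a-1)} \bmod q$ (these are nothing but the register
values on input $\widehat{z}^{(0)}$). Assume the estimates $\widehat{x}(0), \widehat{x}(1), \dots,
\widehat{x}(l-1)$ to be correct, and write $x = \widehat{z}^{(0)} + q^l w$ with
$w = x(l) + q(\cdots)$. An induction on $a$ gives
$z^{(a)} = \widehat{z}^{(a)} + q^{l-a}\, \mathbb{H}^a w$ for $a \le l$. Hence $\widehat{y}^{(a)} = y^{(a)}$ for
$a = 1, \dots, l$. Further, $z^{(l)} = \widehat{z}^{(l)} + \mathbb{H}^l x(l) \bmod q$, and therefore
$$ y^{(l+1)} = \widehat{y}^{(l+1)} + \mathbb{H}^{l+1} x(l) \bmod q . \qquad (8)$$
We therefore proceed to compute $y^{(l+1)} - \widehat{y}^{(l+1)} \bmod q$. If
the linear system $\mathbb{H}^{l+1} x(l) = y^{(l+1)} - \widehat{y}^{(l+1)} \bmod q$ admits
no solution, or more than one solution, that is typical with respect to
the distribution $p_l$, an error is returned. Otherwise, the next
term in the expansion (7) is estimated through the unique
typical solution of this linear system.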

The recovery algorithm is summarized below, with one
improvement with respect to the description above: instead
of recomputing $\widehat{z}^{(0)}, \dots, \widehat{z}^{(l)}$ at stage $l$, we only compute
the vector that is needed at the present stage.

RECOVERY (INPUT: register values $y$)
1: Initialize $z^{(a)} = 0$ for $a \ge 0$;
2: for $l \in \{0, \dots, L-1\}$
3:   Set $\widehat{y}^{(l+1)} = H_{l+1} z^{(l)} \bmod q$;
4:   Let $T_l$ be the set of $p_l$-typical solutions of
     $\mathbb{H}^{l+1} x = y^{(l+1)} - \widehat{y}^{(l+1)} \bmod q$;
5:   If $T_l = \{x\}$ let $\widehat{x}(l) = x$; otherwise ($|T_l| \neq 1$) return error;
6:   Set $z^{(l+1)} = \lfloor (H_{l+1} z^{(l)} + \mathbb{H}^{l+1} \widehat{x}(l))/q \rfloor$;
7: end
8: return $\widehat{x} = \sum_{l < L} \widehat{x}(l)\, q^l$.
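The RECOVERY routine can be sketched as follows, again by exhaustive search over each digit vector and therefore only for toy sizes. The predicate `accept(l, x)` is a hypothetical placeholder for $p_l$-typicality; by default it accepts every vector. The toy braid uses three copies of a matrix that is invertible modulo 3, so that each linear system at step 4 has exactly one solution. Both choices are illustrative assumptions, not part of the construction in Section III-C.

```python
import itertools
from typing import Callable, List, Optional, Sequence

Matrix = List[List[int]]


def mat_vec(h: Matrix, v: Sequence[int]) -> List[int]:
    return [sum(a * b for a, b in zip(row, v)) for row in h]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def storage(x: Sequence[int], hs: List[Matrix], q: int) -> List[List[int]]:
    """y^(l) = H_l z^(l-1) mod q, z^(l) = floor(H_l z^(l-1) / q) (Lemma 2)."""
    z, ys = list(x), []
    for h in hs:
        s = mat_vec(h, z)
        ys.append([v % q for v in s])
        z = [v // q for v in s]
    return ys


def recovery(ys: List[List[int]], hs: List[Matrix], q: int,
             accept: Callable[[int, Sequence[int]], bool] = lambda l, x: True
             ) -> Optional[List[int]]:
    """RECOVERY with exhaustive search for the digit vector x(l).

    accept(l, x) stands in for p_l-typicality; None signals an error.
    """
    n = len(hs[0][0])
    z = [0] * n                                # step 1: z^(0) = 0
    hh = None                                  # HH^(l+1) = H_{l+1} ... H_1
    digits = []
    for l, h in enumerate(hs):                 # step 2
        hh = h if hh is None else mat_mul(h, hh)
        hz = mat_vec(h, z)                     # step 3: H_{l+1} z^(l)
        target = [(a - b) % q for a, b in zip(ys[l], hz)]
        sols = [x for x in itertools.product(range(q), repeat=n)   # step 4
                if accept(l, x)
                and [v % q for v in mat_vec(hh, x)] == target]
        if len(sols) != 1:                     # step 5
            return None
        digits.append(sols[0])
        z = [(a + b) // q for a, b in zip(hz, mat_vec(hh, sols[0]))]  # step 6
    return [sum(d[i] * q ** l for l, d in enumerate(digits))       # step 8
            for i in range(n)]


# Hypothetical toy braid: three identical layers, invertible modulo 3.
M = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
HS, Q = [M, M, M], 3
x = [5, 17, 26]                                # every entry is below q^3
print(recovery(storage(x, HS, Q), HS, Q))      # [5, 17, 26]
```

With $q = 3$, the digit vectors recovered at the three stages are $(2, 2, 2)$, $(1, 2, 2)$ and $(0, 1, 2)$. These are the base-3 digits of $5$, $17$ and $26$.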

C. Sparse graph ensemble and choice of the parameters 

The optimal compression rate in Theorem 1 is achieved
with the following random sparse graph construction. Fix
the register capacity $q$ and an integer $k \ge 2$. Then, for $l =
1, \dots, L_0$, the graph induced by the vertices $R_{l-1} \cup R_l$ has a
random edge set that is sampled by connecting each $i \in R_{l-1}$
to $k$ iid uniformly random vertices in $R_l$ (all edges being
directed from $R_{l-1}$ to $R_l$). In other words, the $m_l \times m_{l-1}$
matrix $H_l$ has independent columns, each sampled by
incrementing $k$ iid uniformly random positions.

The choice of this ensemble is motivated by implementation
concerns. In the flow counting problem, we do not know
a priori the exact number of flows that need to be stored.
With the above structure, this number can be changed without
modifying existing links. Further, for each new flow, the
subset of $k$ registers it is connected to can be chosen through
a simple hash function.

To these $L_0$ stages we add $L_1$ further stages, all of the
same size $m_{L_0+1} = \cdots = m_{L_0+L_1} = m_*$, with edges
connecting each node in $R_{l-1}$ to a different node in $R_l$.
Equivalently, we take $H_l$ to be the identity matrix in these
stages.
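The two kinds of stages can be generated as in the following sketch. A seeded `random.Random` stands in for the hash functions mentioned above. The layer sizes in the example are arbitrary rather than those prescribed by Eqs. (13)-(14). Both are illustrative assumptions, and the function names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[int]]


def sparse_layer(m: int, m_prev: int, k: int, rng: random.Random) -> Matrix:
    """m x m_prev matrix with independent columns; each column is built by
    incrementing k iid uniformly random positions (repeats allowed, so an
    entry may exceed 1)."""
    h = [[0] * m_prev for _ in range(m)]
    for col in range(m_prev):
        for _ in range(k):
            h[rng.randrange(m)][col] += 1
    return h


def identity_layer(size: int) -> Matrix:
    return [[int(i == j) for j in range(size)] for i in range(size)]


def build_braid(sizes: List[int], k: int, extra_stages: int,
                seed: int = 0) -> List[Matrix]:
    """sizes = [n, m_1, ..., m_{L0}]: L0 random sparse stages followed by
    `extra_stages` identity stages of size m_{L0}."""
    rng = random.Random(seed)
    hs = [sparse_layer(sizes[l], sizes[l - 1], k, rng)
          for l in range(1, len(sizes))]
    hs += [identity_layer(sizes[-1]) for _ in range(extra_stages)]
    return hs


hs = build_braid([50, 20, 8], k=3, extra_stages=2, seed=1)
print([(len(h), len(h[0])) for h in hs])  # [(20, 50), (8, 20), (8, 8), (8, 8)]
```

Every column of a sparse stage sums to $k$. This is the property that keeps the number of register operations per increment bounded.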

It remains to specify the numbers of stages $L_0$, $L_1$ and
their sizes $m_1, \dots, m_{L_0}$. Let $p_l$ be the distribution of the $l$-th
least significant digit (counting from $l = 0$) in the $q$-ary expansion
of $X_i$. Recall that we defined $h_l$ to be the $q$-ary entropy of the
distribution $p_l$, i.e.,
$$ h_l = -\sum_{x=0}^{q-1} p_l(x) \log_q p_l(x) . \qquad (9)$$

Finally, in the achievability proof, we shall assume that q 
is a prime number, large enough for hi to be monotonically 
decreasing. 

Lemma 3. Assume $\mathbb{P}\{X_1 \ge x\} \le A x^{-\epsilon}$. Then there exist
constants $B, C$ that depend only on $A, \epsilon$, such that for all
$l \ge 1$ and all $q$ large enough,
$$ h_l \le B\, l\, q^{-l\epsilon} , \qquad (10)$$
$$ \Big| H_2(p) - \sum_{l \ge 0} h_l \log_2 q \Big| \le C q^{-\epsilon} (\log_2 q)^2 . \qquad (11)$$
The proof of this simple lemma is deferred to Section VI.

Lemma 4. Let $p_*$ be a distribution over $\{0, \dots, q-1\}$ with
$q$-ary entropy $H(p_*)$, let $T_n(p_*)$ be the set of $p_*$-typical
vectors defined as in Sec. III-B, and let $|T_n(p_*)|$ be the size
of this set. Recall that $x \in T_n(p_*)$ if its type $\theta_x$ satisfies
$D(\theta_x \| p_*) \le n^{-\gamma}$. Then, for any $\beta \in (1 - \gamma/2, 1)$, there
exists $A = A(\beta, \gamma, q)$ such that
$$ |T_n(p_*)| \le q^{nH(p_*) + A n^{\beta}} .$$
Further, if $X = (X_1, \dots, X_n)$ is a vector with iid entries
with common distribution $p_*$, then
$$ \mathbb{P}\{X \notin T_n(p_*)\} \le (n+1)^q\, e^{-n^{1-\gamma}} . \qquad (12)$$

In the following we will consider $\gamma$ and $\beta$ fixed once and
for all, for instance $\gamma = 1/2$ and $\beta = 7/8$.
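For small $n$, the first claim of Lemma 4 can be checked numerically by summing multinomial coefficients over the typical types. The sketch below assumes that every entry of $p_*$ is positive and uses the illustrative values $q = 3$, $n = 30$, $\gamma = 1/2$. It compares the exact size of $T_n(p_*)$ with the method-of-types bound $(n+1)^q q^{n \max H(\theta)}$ (maximum over typical types $\theta$) that underlies the proof in Section VI. The function names are hypothetical.

```python
import math
from typing import Iterator, Sequence, Tuple


def types(n: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """All count vectors (n_0, ..., n_{parts-1}) summing to n, i.e. all types."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in types(n - first, parts - 1):
            yield (first,) + rest


def entropy_q(dist: Sequence[float], q: int) -> float:
    return -sum(v * math.log(v, q) for v in dist if v > 0)


def kl(theta: Sequence[float], p: Sequence[float]) -> float:
    """D(theta || p) in nats; assumes every entry of p is positive."""
    return sum(t * math.log(t / pt) for t, pt in zip(theta, p) if t > 0)


def typical_set_size(n: int, p: Sequence[float],
                     gamma: float) -> Tuple[int, float]:
    """Exact |T_n(p)| and the bound (n+1)^q * q^(n * max H(theta))."""
    q = len(p)
    size, max_h = 0, 0.0
    for counts in types(n, q):
        theta = [c / n for c in counts]
        if kl(theta, p) <= n ** (-gamma):
            coef = math.factorial(n)
            for c in counts:
                coef //= math.factorial(c)     # multinomial coefficient
            size += coef
            max_h = max(max_h, entropy_q(theta, q))
    return size, (n + 1) ** q * q ** (n * max_h)


p_star, n = [0.5, 0.3, 0.2], 30
size, bound = typical_set_size(n, p_star, gamma=0.5)
print(math.log(size, 3) / n, entropy_q(p_star, 3))  # normalized log-size vs H(p_*)
print(size <= bound)                                # True
```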

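Fix some $\delta > 0$ and let $A(q)$ be a suitably large constant.
For $l = 1, \dots, L_0$ we let
$$ m_l = \max\{\widetilde{m}_l,\ \lceil \delta\, m_{l-1} \rceil\} , \qquad (13)$$
$$ \widetilde{m}_l = \big\lceil (1 + \delta)\big(n h_{l-1} + A(q)\, n^{\beta}\big) \big\rceil . \qquad (14)$$
The number of stages $L_0$ is chosen such that
$$ \widetilde{m}_l \le n (\log n)^{-2} \quad \text{for any } l \ge L_0 . \qquad (15)$$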



By Lemma 3, this implies $L_0 = O(\log \log n)$. To this we
add $L_1 = (\log n)^{3/2}$ stages in the second group, of size
$m_* = m_{L_0} \le n(\log n)^{-2}$. The total number of registers
is therefore upper bounded as
$$ |R| \le n(1+\delta) \sum_{l=0}^{L_0-1} \big(h_l + A n^{\beta-1}\big) \Big(\sum_{i \ge 0} \delta^i\Big) + n(\log n)^{-1/2} ,$$
and therefore the asymptotic rate of this architecture satisfies
$$ r \le \frac{1+\delta}{1-\delta} \sum_{l \ge 0} h_l \log_2 q . \qquad (16)$$
Since the right-hand side can be made arbitrarily close to
$H_2(p)$ by Lemma 3 (taking $\delta$ small and $q$ large), Theorem 1 follows
from the following result.

Theorem 2. For any input distribution $p$ with at most power-law
tails and any choice of $q \ge 2$ and $\delta > 0$, there exists
$k \ge 2$ such that the multi-layer braid described above is
reliable.

IV. Analysis of one-layer architectures 

In order to prove our main theorem or, equivalently,
Theorem 2, we first need to prove a few preliminary results
concerning a one-layer architecture. The proof here follows
the technique of [9], the main tool being an estimate of
the distance enumerator as in [4], [10], [11]. Distance enumerators
for non-binary LDPC codes have been estimated
in [12]. Unfortunately, we cannot limit ourselves here to citing
these works, because our graph ensemble is different from
the regular ones treated there.

Throughout this section the source is a vector $X =
(X_1, \dots, X_n)$ with iid entries taking values in $\{0, \dots, q-1\}$
and distribution $p_*$ (in the application to multi-layer schemes,
$p_*$ will coincide with $p_l$ for some $l \ge 0$). We let $H$ be
an $m \times n$ matrix whose columns are independent vectors
with integer entries. Each column is obtained by choosing $k$
positions independently and uniformly at random (possibly
with repetition) and incrementing the corresponding entries by
one. In other words, $H$ is distributed as the adjacency matrix
of a given layer in the multi-layer architecture.

Our first result is a simple combinatorial calculation. Let
$\lambda = \{\lambda_z : z = 1, \dots, q-1\}$ be a vector in $\mathbb{R}_+^{q-1}$.
It is convenient to introduce the random variable $W =
\{W_z : z = 1, \dots, q-1\}$ taking values in $\mathbb{N}^{q-1}$. The
joint distribution of $(W_1, \dots, W_{q-1})$ (to be denoted by $\mathbb{P}_\lambda$)
is that of $q-1$ independent Poisson random variables with means
(respectively) $\lambda_1, \dots, \lambda_{q-1}$, conditioned on
$\sum_{z=1}^{q-1} z W_z = 0 \bmod q$.

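Lemma 5. Let $x \in \{0, \dots, q-1\}^n$ be an input vector with
$n_z$ entries equal to $z$, for $z = 0, \dots, q-1$, and let $H$ be a
random matrix as above. Define $\mathbf{n} = \{n_z : z = 1, \dots, q-1\}$.
For any $\lambda \in \mathbb{R}_+^{q-1}$, let $W_1, \dots, W_m$ be $m$ iid vectors with
distribution $\mathbb{P}_\lambda$. Then the probability that $Hx = 0 \bmod q$
is
$$ \mathbb{P}\{Hx = 0\} = \prod_{z=1}^{q-1} \frac{(kn_z)!}{(m\lambda_z)^{kn_z} e^{-m\lambda_z}}\; Q(\lambda)^m\; \mathbb{P}_\lambda\Big\{\sum_{i=1}^m W_i = k\mathbf{n}\Big\} , \qquad (17)$$
where $Q(\lambda)$ is the probability that $\sum_{z=1}^{q-1} z U_z = 0 \bmod q$
for independent Poisson random variables $U_z$ with means $\lambda_z$.

Further, for some universal constant $C$, with $D_q = 1 -
\cos(2\pi/q)$ and $n_* = \sum_z n_z$,
$$ \mathbb{P}\{Hx = 0\} \le (Ckn_*)^{(q-1)/2}\, Q(k\mathbf{n}/m)^m\, R\Big(m, \frac{k}{q}\sum_z z n_z\Big) , \qquad (18)$$
$$ Q(k\mathbf{n}/m) \le \frac{1}{q}\Big[1 + (q-1)\, e^{-D_q k n_*/m}\Big] , \qquad (19)$$
$$ R(m, N) = \min\{1, (Cq^2 N/m)^N\} . \qquad (20)$$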

Proof. Due to the symmetry of the distribution of $H$ with
respect to permutations of its columns, $\mathbb{P}\{Hx = 0\}$
depends on $x$ only through the number of ones, twos, etc.
Without loss of generality we can assume the first $n_1$
coordinates to be ones, the next $n_2$ to be twos, and so on, and
neglect the last $n - \sum_z n_z$ columns, corresponding to zeros.
Think now of filling the matrix by choosing its non-zero
entries (edges in the associated graph). If we associate to
each such entry the value of the corresponding coordinate of
$x$, we want the probability for the sum of labels on each row
to be $0 \bmod q$. Since entries are independent and uniformly
random, this is equal to the probability that each of $m$ urns
is filled with balls whose labels add up to $0 \bmod q$, when we throw
$kn_1$ balls labeled with 1, $kn_2$ labeled with 2, and so on. It
is an exercise in combinatorics to show that this is
$$ \frac{\prod_{z=1}^{q-1} (kn_z)!}{m^{kn_*}}\; \mathrm{coeff}\Big\{ g(t)^m ;\ t_1^{kn_1} \cdots t_{q-1}^{kn_{q-1}} \Big\} , \qquad g(t) = \sum_{w \in \mathbb{N}^{q-1} :\ \sum_z z w_z \equiv 0 \ ({\rm mod}\ q)} \ \prod_{z=1}^{q-1} \frac{t_z^{w_z}}{w_z!} .$$
Equation (17) is then obtained by evaluating $\mathbb{P}_\lambda\{\sum_i W_i = k\mathbf{n}\}$
and showing that it yields the above combinatorial expression.

In order to get Eq. (18), we denote $\mathbb{P}_\lambda\{\cdots\}$ by $R$ and
use $\lambda_z = kn_z/m$, thus leading to
$$ \mathbb{P}\{Hx = 0\} = \prod_{z=1}^{q-1} \frac{(kn_z)!}{(kn_z)^{kn_z} e^{-kn_z}}\; Q(k\mathbf{n}/m)^m\, R .$$
Equation (18) follows from the observation that $N! \le
C\sqrt{N}\,(N/e)^N$ for some universal constant $C$.

In order to prove Eq. (19), notice that, by discrete Fourier
transform,
$$ Q(\lambda) = \frac{1}{q} \sum_{l=0}^{q-1} \exp\Big\{ -\sum_{z=1}^{q-1} \lambda_z \big(1 - e^{2\pi i l z/q}\big) \Big\} .$$
The claim is proved by singling out the $l = 0$ term and
bounding the others using ${\rm Re}(1 - e^{2\pi i l z/q}) \ge D_q$, which holds
because $q$ is prime, so that $lz \not\equiv 0 \bmod q$ for $l, z \in \{1, \dots, q-1\}$.

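Let us finally prove Eq. (20). Obviously $R \le 1$, since
it is an upper bound on the probability $\mathbb{P}_\lambda\{\cdots\}$. If we let
$N = \frac{k}{q} \sum_{z=1}^{q-1} z n_z$, we can therefore assume, without loss
of generality, that $N$ is an integer with $N/m \le 1/q$. Let
$V_i = \frac{1}{q}\sum_{z=1}^{q-1} z W_{i,z}$, which is a non-negative integer by the
definition of $\mathbb{P}_\lambda$. Then the probability $\mathbb{P}_\lambda\{\cdots\}$ is upper bounded by
$$ \mathbb{P}\Big\{\sum_{i=1}^m V_i \ge N\Big\} \le \binom{m}{N}\, \mathbb{P}\{V_1 \ge 1\}^N \le \Big(\frac{Cm}{N}\Big)^N \mathbb{P}\{V_1 \ge 1\}^N .$$
Recalling the definition of $V_i$, we have
$$ \mathbb{P}\{V_1 \ge 1\} = \mathbb{P}\Big\{ \sum_{z=1}^{q-1} z U_z \ge q \ \Big|\ \sum_{z=1}^{q-1} z U_z = 0 \bmod q \Big\} \le e^{\sum_z \lambda_z}\, \mathbb{P}\Big\{ \sum_{z=1}^{q-1} z U_z \ge q \Big\} .$$
But $\sum_z \lambda_z = kn_*/m \le Nq/m \le 1$. Further,
$\sum_{z=1}^{q-1} z U_z \ge q$ only if $\sum_{z=1}^{q-1} U_z \ge 2$. Therefore we get
$$ \mathbb{P}\{V_1 \ge 1\} \le e\, \mathbb{P}\Big\{ \sum_{z=1}^{q-1} U_z \ge 2 \Big\} \le C \Big( \sum_{z=1}^{q-1} \lambda_z \Big)^2 \le C (kn_*/m)^2 .$$
The proof is completed by noticing that $kn_*/m \le Nq/m$. □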
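The Fourier representation of $Q(\lambda)$ and the bound (19) are easy to check numerically. In the sketch below, `lam[z-1]` holds $\lambda_z$, and the bound is evaluated with $\sum_z \lambda_z$ in place of $kn_*/m$ (the two coincide when $\lambda_z = kn_z/m$). The specific means and $q = 5$ (prime, as the bound requires) are illustrative assumptions. The helper `q_direct` recomputes the same probability by convolving truncated Poisson laws modulo $q$.

```python
import cmath
import math
from typing import Sequence


def q_fourier(lam: Sequence[float], q: int) -> float:
    """Q(lambda) = (1/q) sum_l exp(-sum_z lam_z (1 - e^{2 pi i l z / q}))."""
    total = 0j
    for l in range(q):
        expo = sum(lz * (1 - cmath.exp(2j * math.pi * l * z / q))
                   for z, lz in enumerate(lam, start=1))
        total += cmath.exp(-expo)
    return (total / q).real


def q_direct(lam: Sequence[float], q: int, cutoff: int = 60) -> float:
    """P{sum_z z U_z = 0 mod q} by convolving truncated Poisson laws mod q."""
    dist = [1.0] + [0.0] * (q - 1)        # law of the partial sum mod q
    for z, lz in enumerate(lam, start=1):
        new = [0.0] * q
        for u in range(cutoff):
            pu = math.exp(-lz) * lz ** u / math.factorial(u)
            for r in range(q):
                new[(r + z * u) % q] += dist[r] * pu
        dist = new
    return dist[0]


def q_bound(lam: Sequence[float], q: int) -> float:
    """Right-hand side of (19), with sum_z lam_z in place of k n_* / m."""
    d_q = 1 - math.cos(2 * math.pi / q)
    return (1 + (q - 1) * math.exp(-d_q * sum(lam))) / q


lam, q = [0.3, 0.2, 0.4, 0.1], 5
print(q_fourier(lam, q), q_direct(lam, q), q_bound(lam, q))
```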

In the following, given a vector $x = (x_1, \dots, x_n)$, we
shall denote by $\|x\|_0$ its number of non-zero entries.

Lemma 6. Let $H$ be a random $m \times n$ matrix distributed as
above, with column weight $k$. Assume $k$ not to be a multiple
of $q$, $m \le n$, $(m/nk)^{1/k} \ge \Delta > 0$ and $3 \le k \le m/\log m$.
Then there exist constants $B = B(q, \Delta)$ and $C = C(q, \Delta)$
such that, if
$$ E \le \frac{C m}{\log(nk/m)} , \qquad (21)$$
then
$$ \mathbb{P}\{\exists z \neq 0 : \|z\|_0 \le E,\ Hz = 0 \bmod q\} \le n^2 \Big(\frac{Bk}{m}\Big)^{k/q} \qquad (22)$$
(where it is understood that $z \in \{0, \dots, q-1\}^n$).

Proof. Throughout the proof, $A$ will denote a generic constant
depending only on $q$ that can be chosen large enough
to make the inequalities below hold.

Let $z \in \{0, \dots, q-1\}^n$ be such that $\|z\|_0 = \ell$. We will
upper bound the probability that $Hz = 0 \bmod q$ in different
ways, depending on whether $\ell \le E_0$ or $\ell > E_0$, where
$$ E_0 = p(q)\, \frac{m}{k} \Big(\frac{m}{nk}\Big)^{2/(k-2)} , \qquad (23)$$
with $p(q)$ a function to be determined. Notice that, under our
hypotheses,
$$ \frac{kE_0}{m} \ge p(q)\, \Delta^{2k/(k-2)} \qquad (24)$$
is bounded away from 0 (as $2 < 2k/(k-2) \le 6$ for $k \ge 3$).

For $\|z\|_0 = \ell \le E_0$ (and $z \neq 0$) we use Lemma 5,
Eq. (18), where we set $Q(\cdot) \le 1$, $n_* = \ell$ and
$N = \frac{k}{q}\sum_z z n_z \ge k\ell/q$. Further, we assume $Ak\ell/m \le 1$, which
holds without loss of generality if we take $p(q) \le \Delta^6/A \le
\Delta^{2k/(k-2)}/A$ in Eq. (23), thus getting
$$ \mathbb{P}\{Hz = 0\} \le (Ak\ell)^{(q-1)/2} \Big(\frac{Ak\ell}{m}\Big)^{k\ell/q} . \qquad (25)$$
Since $(k\ell)^{(q-1)/2} \le A^{k\ell/q}$, we have (by properly adjusting
$A$)
$$ \mathbb{P}\{Hz = 0\} \le \Big(\frac{Ak\ell}{m}\Big)^{k\ell/q} . \qquad (26)$$

For $\|z\|_0 = \ell > E_0$, we use Eq. (18) with $R(\cdot) \le 1$. Since
$k\ell/m > kE_0/m$ is bounded away from 0 by Eq. (24),
we have $Q(\cdot) \le e^{-C}$ for some $C = C(\Delta, q) > 0$, and
therefore
$$ \mathbb{P}\{Hz = 0\} \le (Ak\ell)^{(q-1)/2}\, e^{-Cm} . \qquad (27)$$

There are at most $\binom{n}{\ell}(q-1)^\ell \le (An/\ell)^\ell$ vectors $z$ with
$\|z\|_0 = \ell$. If we denote by $P_{E_1,E_2}$ the probability of the event
$\{\exists z : E_1 \le \|z\|_0 \le E_2,\ Hz = 0 \bmod q\}$, then the probability
in Eq. (22) is upper bounded by $P_{2,E_0} + P_{E_0,E}$ (notice that if
$k$ is not a multiple of $q$, $Hz = 0$ is impossible for $\|z\|_0 = 1$).
By the union bound we have
$$ P_{2,E_0} \le \sum_{\ell=2}^{E_0} \Big(\frac{An}{\ell}\Big)^\ell \Big(\frac{Ak\ell}{m}\Big)^{k\ell/q} \le \Big(\frac{An}{2}\Big)^2 \Big(\frac{2Ak}{m}\Big)^{2k/q} \sum_{\ell=2}^{E_0} \xi(\ell)^{\ell-2} ,$$
where (using $(\ell/2)^{2k/q-2} \le A^{k(\ell-2)/q}$ and eventually adjusting
the constant $A$)
$$ \xi(\ell) = \frac{An}{\ell} \Big(\frac{Ak\ell}{m}\Big)^{k/q} .$$
For $\ell \le E_0$, and choosing $p(q)$ small enough in Eq. (23), we
obtain $\xi(\ell) \le 1/2$, thus leading to $P_{2,E_0} \le n^2 (Ak/m)^{k/q}$.

Finally, consider the contribution of vectors with $\|z\|_0 > E_0$.
Proceeding as above, we have
$$ P_{E_0,E} \le \sum_{\ell=E_0}^{E} \Big(\frac{An}{\ell}\Big)^\ell (Ak\ell)^{(q-1)/2} e^{-Cm} \le E \Big(\frac{An}{E}\Big)^E (AkE)^{(q-1)/2} e^{-Cm} .$$
Here we bounded $(An/\ell)^\ell = [(An/\ell)^{\ell/An}]^{An} \le (An/E)^E$,
using the fact that $x^{-x}$ is an increasing function of $x$ for
$x \le e^{-1}$, and that $E/An \le Cm/(An\log(nk/m)) \le Cm/An$
is smaller than $e^{-1}$ for $C$ small enough.

Finally, we bound $E^{(q+1)/2} \le A^E$ and $k^{(q-1)/2} \le k^E$
(which holds for $m$ large enough), and use $n/E \le Ank/m$ (a consequence
of Eq. (24), since only $E \ge E_0$ matters), thus getting
$$ P_{E_0,E} \le \Big(\frac{nkA}{m}\Big)^E e^{-Cm} .$$
If we take $E = Cm/(2\log(nkA/m))$, we get $P_{E_0,E} \le
e^{-Cm/2}$, which is smaller than $(Bk/m)^{k/q}$ for a properly
chosen constant $B$ and $k \le m/\log m$ (indeed, $k \le m\epsilon_m$
would be enough for any $\epsilon_m \to 0$). □



V. Analysis of multi-layer architectures and proof of Theorem 1

Proof. Let $P_{\rm err}^{(l)}$ denote the probability that the $l$-th term in the
$q$-ary expansion of $x$ is decoded incorrectly by the decoder
in Section III-B (i.e., that $\widehat{x}(l) \neq x(l)$), given that $x(0), \dots,
x(l-1)$ have been correctly recovered. We will prove that
$P_{\rm err}^{(l)} = O(n^{-\kappa})$ for some $\kappa > 0$. Since the multi-layer
architecture involves at most $C(\log n)^2$ layers, this implies
the thesis. Further, we shall consider only the first $L_0$ layers,
since it will be clear from the derivation below that the error
probability is decreasing for the last $L_1$ layers.

Let $x$ be the input. Since we are focusing on the $l$-th term
in the $q$-ary expansion of the input, we will write $x$ for $x(l)$,
so that $x \in \{0, \dots, q-1\}^n$. This is just a vector whose
entries are iid with distribution $p_l$.

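The error probability $P_{\rm err}^{(l)}$ is upper bounded by the probability
that $x \notin T_n(p_l)$ plus the probability that there exists
$x' \in T_n(p_l)$, $x' \neq x$, with $\mathbb{H}^{l+1} x' = \mathbb{H}^{l+1} x \bmod q$. The first contribution
is bounded by Lemma 4, and we can therefore neglect
it. Denoting the second contribution by $P_{\rm coll}^{(l)}$, and writing
$\mathbb{E}_x$, $\mathbb{P}$ for (respectively) expectation with respect to $x$ and
probability with respect to the matrices $H_1, \dots, H_{l+1}$, we have
(matrix multiplications below are understood to be modulo
$q$, and $\mathbb{H}^0$ is the identity)
$$ P_{\rm coll}^{(l)} = \mathbb{E}_x \mathbb{P}\{\exists x' \in T_n(p_l) \setminus \{x\} \text{ s.t. } \mathbb{H}^{l+1} x' = \mathbb{H}^{l+1} x\} \le \sum_{t=1}^{l+1} Q_t^{(l)} ,$$
$$ Q_t^{(l)} = \mathbb{E}_x \mathbb{P}\{\exists x' \in T_n(p_l) \setminus \{x\} \text{ s.t. } \mathbb{H}^t x' = \mathbb{H}^t x,\ \mathbb{H}^{t-1} x' \neq \mathbb{H}^{t-1} x\} .$$

Since $l < L_0 = O(\log n)$, it is sufficient to show $Q_t^{(l)} =
O(n^{-\kappa})$. In $Q_t^{(l)}$ we can separate the error events due to inputs
$x'$ such that $d_t \equiv d(\mathbb{H}^{t-1} x', \mathbb{H}^{t-1} x) \le E$ (with $d$ the Hamming
distance) from the other ones. As a consequence, $Q_t^{(l)}$ is upper bounded by
$$ \begin{aligned} & \mathbb{E}_x \mathbb{P}\{\exists x' \in T_n(p_l) \text{ s.t. } 1 \le d_t \le E,\ \mathbb{H}^t x' = \mathbb{H}^t x\} \\ & + \mathbb{E}_x \mathbb{P}\{\exists x' \in T_n(p_l) \text{ s.t. } E < d_t,\ \mathbb{H}^t x' = \mathbb{H}^t x\} \\ & \le \mathbb{P}\{\exists z \text{ s.t. } 1 \le \|z\|_0 \le E,\ H_t z = 0\} + |T_n(p_l)| \sup\{\mathbb{P}\{H_t z = 0\} : \|z\|_0 > E\} . \end{aligned}$$
Here $z$ is understood to be an $m_{t-1}$-dimensional vector with
entries in $\{0, \dots, q-1\}$.

Notice that $(m_t/km_{t-1})^{1/k} \ge (\delta/k)^{1/k} \ge \delta$. Next we
choose $E = C(q, \Delta = \delta)\, m_t/\log(m_{t-1}k/m_t)$, with $C(q, \Delta)$
as in the statement of Lemma 6. As a consequence, the first
term above is upper bounded by
$$ m_{t-1}^2 \Big(\frac{Bk}{m_t}\Big)^{k/q} \le C (\log n)^{2k/q}\, n^{-k/q+2} ,$$
where we used $m_{t-1} \le m_t/\delta$ and $m_t \ge n/(\log n)^2$. The
constant $C$ depends only on $q$, $k$, $\delta$, but not on $n$.

It remains to bound the second contribution, due to inputs
$x'$ with $d_t > E$. Using Lemmas 4 (to bound $|T_n(p_l)|$)
and 5 (to bound $\mathbb{P}\{H_t z = 0\}$ for $\|z\|_0 > E$), we get
$$ \mathbb{E}_x \mathbb{P}\{\exists x' \in T_n(p_l) \text{ s.t. } E < d_t,\ \mathbb{H}^t x' = \mathbb{H}^t x\} \le q^{nh_l + An^\beta} (Ckn)^{(q-1)/2} \Big[\frac{1}{q}\big(1 + (q-1)\, e^{-D_q kE/m_t}\big)\Big]^{m_t} .$$
By eventually enlarging the constant $A$ (in a way that
depends on $q$), we can get rid of the factor $(Cn)^{(q-1)/2}$. By
further using $1 + x \le q^{x/\ln q}$, we can upper bound the
above by $k^{(q-1)/2} q^{\Phi}$, where
$$ \Phi = nh_l + A(q)n^\beta - m_t + A'(q)\, m_t\, e^{-D_q kE/m_t} ,$$
with $A'(q) = (q-1)/\ln q$. Notice that $kE/m_t =
C(q, \delta)\, k/\log(km_{t-1}/m_t)$ can be made arbitrarily large by
taking $k$ large enough. In particular, we can choose $k_*(q, \delta)$
such that $A'(q)\, e^{-D_q kE/m_t} \le \delta/3$ for any $k \ge k_*$. For such
$k$, and using the fact that $m_t \ge \widetilde{m}_t \ge [nh_l + A(q)n^\beta](1 + \delta)$
(recall that $t \le l+1$ and that $h_l$ is decreasing in $l$), we get, for $\delta \le 1$,
$$ \Phi \le nh_l + A(q)n^\beta - m_t(1 - \delta/3) \le -\frac{\delta}{3}\big[nh_l + A(q)n^\beta\big] .$$

Summing the various contributions, we obtain, for any $k \ge
k_*(q, \delta)$,
$$ Q_t^{(l)} \le C(q,k,\delta)\, (\log n)^{2k/q}\, n^{-k/q+2} + k^{(q-1)/2}\, q^{-\delta(A(q)n^\beta + nh_l)/3} ,$$
which proves the thesis. □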

VI. Some auxiliary results 

Proof of Lemma 3. First consider Eq. (10). Let $X_1$ be an
integer random variable with distribution $p$, $X_1(l)$ its $l$-th
least significant $q$-ary digit, and $Z$ the indicator function of
$\{X_1(l) > 0\}$. From $H(X_1(l)) = H(Z) + H(X_1(l)|Z)$ it
follows that, for $p_l = \mathbb{P}\{X_1 \ge q^l\} \le 1/2$,
$$ h_l \le p_l \log_q(q-1) - p_l \log_q p_l - (1-p_l) \log_q(1-p_l) .$$
Choosing $q$ large enough so that $p_l \le Aq^{-\epsilon} \le 1/2$ for all
$l \ge 1$, we can upper bound $-(1-p_l)\log_q(1-p_l)$ by $2p_l$,
thus getting
$$ h_l \le 3p_l - p_l \log_q p_l ,$$
which implies Eq. (10), since $p_l \le Aq^{-l\epsilon}$.

In order to prove Eq. (11), first notice that $H(X_1) =
H(\{X_1(l)\}_{l \ge 0}) \le \sum_{l \ge 0} H(X_1(l))$, whence $H_2(p) \le
\sum_{l \ge 0} h_l \log_2 q$. Since $X_1(0)$ is a function of $X_1$, we also have
$H_2(p) \ge h_0 \log_2 q$. The thesis follows by bounding $\sum_{l \ge 1} h_l$ using Eq. (10). □
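The key inequality of this proof, $h_l \le 3p_l - p_l \log_q p_l$ whenever $p_l = \mathbb{P}\{X_1 \ge q^l\} \le 1/2$, can be checked on an example. The truncated power law $p(x) \propto (x+1)^{-2}$ on $\{0, \dots, q^4 - 1\}$ with $q = 7$ is an assumption made only for illustration, and the helper names are hypothetical.

```python
import math
from typing import List


def power_law(alpha: float, support: int) -> List[float]:
    """p(x) proportional to (x + 1)^(-alpha) on {0, ..., support - 1}."""
    w = [(x + 1) ** (-alpha) for x in range(support)]
    s = sum(w)
    return [v / s for v in w]


def digit_entropy(p: List[float], q: int, l: int) -> float:
    """h_l: q-ary entropy of the l-th least significant q-ary digit of X."""
    dist = [0.0] * q
    for x, px in enumerate(p):
        dist[(x // q ** l) % q] += px
    return -sum(v * math.log(v, q) for v in dist if v > 0)


def tail_prob(p: List[float], q: int, l: int) -> float:
    """p_l = P{X >= q^l}."""
    return sum(p[q ** l:])


q = 7
p = power_law(alpha=2.0, support=q ** 4)
for l in range(1, 4):
    p_l = tail_prob(p, q, l)
    bound = 3 * p_l - p_l * math.log(p_l, q)   # right-hand side in the proof
    print(l, p_l, digit_entropy(p, q, l), bound)
```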

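Proof of Lemma 4. The number of vectors with type $\theta$ is
upper bounded by $q^{nH(\theta)}$. Since there are at most $(n+1)^q$
distinct types, $|T_n(p_*)| \le q^{nH(p_*) + nK_n}$, where
$$ K_n = \sup\{H(\theta) - H(p_*) : D(\theta \| p_*) \le n^{-\gamma}\} + \frac{q \log_q(n+1)}{n} .$$
The claim follows from the bound $H(\theta) - H(p_*) \le \|\theta - p_*\|_1 \log_q(q/\|\theta - p_*\|_1)$
together with $\|\theta - p_*\|_1 \le \sqrt{2D(\theta \| p_*)}$ [3]. These give
$nK_n = O(n^{1-\gamma/2}\log n) \le An^\beta$ for any $\beta > 1 - \gamma/2$.
Equation (12) is just Sanov's theorem. □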